Beyond Gaussians: Spectral Methods for Learning Mixtures of Heavy-Tailed Distributions

Authors

  • Kamalika Chaudhuri
  • Satish Rao

Abstract

We study the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions $D = \{D_1, \ldots, D_T\}$ together with mixing weights $w_1, \ldots, w_T$. A sample from the mixture is drawn by selecting $D_i$ with probability $w_i$ and then drawing a sample from $D_i$. The goal in learning a mixture is to recover the parameters of the component distributions, given only samples from the mixture. In this paper, we focus on learning mixtures of heavy-tailed product distributions, a problem studied by [DHKS05]. The challenge in learning such mixtures is that the techniques developed for learning mixture models, such as spectral methods and distance concentration, do not apply. The previous algorithm for this problem, due to [DHKS05], achieved performance comparable to that of the algorithms of [AM05, KSV05, CR08] for mixtures of Gaussians, but took time exponential in the dimension. We provide an algorithm with the same performance that runs in polynomial time. Our main contribution is an embedding that transforms a mixture of heavy-tailed product distributions into a mixture of distributions over the hypercube in a higher dimension while maintaining separability. Combining this embedding with standard spectral techniques yields algorithms that learn mixtures of heavy-tailed distributions with separation comparable to the guarantees of [DHKS05]. Our algorithm runs in time polynomial in the dimension, the number of clusters, and the imbalance in the weights.
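
The sampling process described above, and the general idea of mapping heavy-tailed coordinates onto hypercube points, can be illustrated with a small sketch. It makes several assumptions. It uses numpy. It takes product Cauchy distributions as the example of heavy-tailed components. It uses a fixed grid of per-coordinate thresholds. The names `sample_mixture` and `threshold_embedding` are hypothetical. The threshold map is an illustrative stand-in, not the embedding constructed in this paper.

```python
import numpy as np

def sample_mixture(weights, centers, scale, n, rng):
    """Draw n samples from a mixture of product Cauchy distributions.

    Assumption (illustrative only): component i is a product of independent
    Cauchy(centers[i][j], scale) coordinates. Each sample first picks
    component i with probability weights[i], then samples from it.
    """
    weights = np.asarray(weights, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = rng.choice(len(weights), size=n, p=weights)   # pick D_i w.p. w_i
    noise = scale * rng.standard_cauchy(size=(n, centers.shape[1]))
    return centers[labels] + noise, labels                 # sample from D_i

def threshold_embedding(X, thresholds):
    """Hypothetical embedding into the hypercube {0,1}^m.

    Coordinate j of each point becomes the bits [x_j > t] for every
    threshold t in thresholds[j]. This is NOT the paper's construction.
    """
    bits = [X[:, [j]] > np.asarray(t)[None, :] for j, t in enumerate(thresholds)]
    return np.hstack(bits).astype(np.int8)

# Usage: two well-separated components in 3 dimensions, 9 thresholds per coordinate.
rng = np.random.default_rng(0)
X, y = sample_mixture([0.5, 0.5], [[-5, -5, -5], [5, 5, 5]], 1.0, 200, rng)
thresholds = [np.linspace(-10, 10, 9) for _ in range(X.shape[1])]
B = threshold_embedding(X, thresholds)   # shape (200, 27), entries in {0, 1}
```

Each output bit is bounded no matter how large the underlying coordinate is, which is one reason a map of this kind can tame heavy tails. Whether separability is preserved depends on how the thresholds are chosen, and this sketch does not address that question.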

Similar resources

On Spectral Learning of Mixtures of Distributions

We consider the problem of learning mixtures of distributions via spectral methods and derive a tight characterization of when such methods are useful. Specifically, given a mixture-sample, let $\mu_i$, $C_i$, $w_i$ denote the empirical mean, covariance matrix, and mixing weight of the $i$-th component. We prove that a very simple algorithm, namely spectral projection followed by single-linkage clustering, ...
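
The pipeline named in this abstract, spectral projection followed by single-linkage clustering, can be sketched in a few lines. This is a minimal illustration under several assumptions. It uses numpy. It applies an uncentered projection onto the top-$k$ right singular vectors of the sample matrix. It implements single linkage via Kruskal's algorithm, stopping once $k$ components remain. The helpers `spectral_project` and `single_linkage` are hypothetical names, and this is not the cited paper's code.

```python
import numpy as np

def spectral_project(X, k):
    """Project the rows of X onto the span of its top-k right singular vectors."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:k].T

def single_linkage(P, k):
    """Single-linkage clustering into k clusters.

    Kruskal's algorithm: merge the closest pair of points from different
    components, in increasing distance order, until k components remain.
    """
    n = len(P)
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    dists = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    iu, ju = np.triu_indices(n, k=1)                  # all unordered pairs
    order = np.argsort(dists[iu, ju], kind="stable")  # shortest edges first
    components = n
    for e in order:
        if components == k:
            break
        ri, rj = find(int(iu[e])), find(int(ju[e]))
        if ri != rj:
            parent[ri] = rj
            components -= 1
    roots = [find(i) for i in range(n)]
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[r] for r in roots])

# Usage: two Gaussian clusters in R^20 with means +10*1 and -10*1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 20)) + 10, rng.normal(0, 1, (50, 20)) - 10])
labels = single_linkage(spectral_project(X, 2), 2)
```

Projecting first reduces the noise that single linkage sees, since each point's random deviation is mostly discarded while the separation between the means is kept.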

A Spectral Algorithm for Learning Mixtures of Distributions

We show that a simple spectral algorithm for learning a mixture of $k$ spherical Gaussians in $\mathbb{R}^n$ works remarkably well: it succeeds in identifying the Gaussians assuming essentially the minimum possible separation between their centers that keeps them unique (solving an open problem of [1]). The sample complexity and running time are polynomial in both $n$ and $k$. The algorithm also works for the m...

The Spectral Method for General Mixture Models

We present an algorithm for learning a mixture of distributions based on spectral projection. We prove a general property of spectral projection for arbitrary mixtures and show that the resulting algorithm is efficient when the components of the mixture are log-concave distributions in $\mathbb{R}^n$ whose means are separated. The separation required grows with $k$, the number of components, and $\log n$. This is ...

Tail Dependence for Heavy-Tailed Scale Mixtures of Multivariate Distributions

The tail dependence of multivariate distributions is frequently studied via the tool of copulas. This paper develops a general method, which is based on multivariate regular variation, to evaluate the tail dependence of heavy-tailed scale mixtures of multivariate distributions, whose copulas are not explicitly accessible. Tractable formulas for tail dependence parameters are derived, and a suff...

A Study of Skewed Heavy-tailed Distributions as Scale Mixtures

In this paper, we study and compare different proposals of heavy-tailed (possibly skewed) distributions as robust alternatives to the normal model. The density functions are all represented as scale mixtures which enables efficient Bayesian estimation via Markov chain Monte Carlo (MCMC) methods. However, while the symmetric versions of these distributions are able to model heavy tails they of c...

Publication date: 2008